Corpus-Driven Generation of Weather Forecasts
Abstract
In traditional natural language generation (NLG), careful analysis of a corpus of example texts and determining the single correct sublanguage behind it is seen as one of the main tasks of the NLG system builder. In practice, this often means elimination of variation in the corpus and specification of conditions for rule application to the point where an NLG system becomes (virtually) deterministic. This approach is time-consuming, does not apply objective criteria for deciding what is correct, and contributes to the lack of robustness and reusable components in NLG. Moreover, with variation regarded as a ‘bug’ to be eliminated, systems run the risk of implementing a subjective or restrictive view of the domain sublanguage. This research note argues that relative frequency provides an objective, easily applicable tool for dealing with corpus variation. The probabilistic approach to NLG can also help cut down on manual corpus analysis, make systems more robust and components more reusable. A methodology is described that combines use of a base generator with a separate, automatically adaptable, probabilistic decision-making component. Three different decision-making techniques are compared and evaluated with a focus on their ability to model idiolectal variation.
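As a rough illustration of how relative frequency can arbitrate between corpus variants, the sketch below counts which phrase appears for each data context and picks the most frequent one. It is a minimal sketch and not the method described in the paper: the context labels, phrases, toy corpus, and function names are all invented for illustration.

```python
from collections import Counter, defaultdict


def relative_frequencies(observations):
    """Map each context to {variant: relative frequency}.

    `observations` is an iterable of (context, variant) pairs.
    """
    counts = defaultdict(Counter)
    for context, variant in observations:
        counts[context][variant] += 1
    return {
        context: {v: n / sum(c.values()) for v, n in c.items()}
        for context, c in counts.items()
    }


def most_frequent(freqs, context):
    """Pick the variant with the highest relative frequency for a context."""
    return max(freqs[context], key=freqs[context].get)


# Hypothetical observations: (data context, phrase a forecaster used).
toy_corpus = [
    ("wind_increase", "increasing"),
    ("wind_increase", "increasing"),
    ("wind_increase", "rising"),
    ("wind_decrease", "easing"),
    ("wind_decrease", "decreasing"),
    ("wind_decrease", "easing"),
    ("wind_decrease", "easing"),
]

freqs = relative_frequencies(toy_corpus)
print(freqs["wind_increase"])
print(most_frequent(freqs, "wind_decrease"))
```

Always picking the most frequent variant is only one possible decision rule; sampling in proportion to the frequencies would instead preserve the variation seen in the corpus.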
Related resources
Building a Parallel Spatio-Temporal Data-Text Corpus for Summary Generation
We describe a corpus of naturally occurring road ice weather forecasts and the associated weather prediction data they are based upon. We also show how observations from an analysis of this corpus have been applied to build a prototype Natural Language Generation (NLG) system for producing road ice forecasts. While this corpus occurs in a narrow domain, it has much wider applicability due to th...
Exploiting a parallel TEXT - DATA corpus
In this paper, we describe SUMTIME-METEO, a parallel corpus of naturally occurring weather forecast texts and their corresponding forecast data; data that the human authors inspected while writing the forecast texts. We have analysed the corpus to acquire knowledge needed to build a text generator for automatically producing textual weather forecasts from numerical weather prediction data. Alth...
A Ridge Moving East across the North Sea This Evening . a Vigorous
In this paper, we describe SUMTIME-METEO, a parallel corpus of naturally occurring weather forecast texts and their corresponding forecast data; data that the human authors inspected while writing the forecast texts. We have analysed the corpus to acquire knowledge needed to build a text generator for automatically producing textual weather forecasts from numerical weather prediction data. Alth...
Choosing words in computer-generated weather forecasts
One of the main challenges in automatically generating textual weather forecasts is choosing appropriate English words to communicate numeric weather data. A corpus-based analysis of how humans write forecasts showed that there were major differences in how individual writers performed this task, that is, in how they translated data into words. These differences included both different preferen...
Sublanguage Engineering In The Fog System
FoG currently produces bilingual marine and public weather forecasts at several Canadian weather offices. The system is engineered to reflect "good professional style" as found in human forecasts. However, some regularization and simplification of the output has been needed. Sublanguage engineering issues include tradeoffs in coverage and style, handling variation and evolution of sublanguages,...